Abstract. We analyze X-ray sources detected over 4.2 pseudo-contiguous sq. deg. in the 0.5-2 keV and 2-10 keV bands down
to fluxes of $2 \times 10^{-15}$ and $8 \times 10^{-15}$ erg s$^{-1}$ cm$^{-2}$ respectively, as part of the XMM-Newton Large Scale Structure Survey. The
logN-logS in both bands shows a steep slope at bright fluxes, but agrees well with other determinations below ~ $2 \times 10^{-14}$ erg
s$^{-1}$ cm$^{-2}$. The detected sources resolve close to 30 per cent of the X-ray background in the 2-10 keV band. We study the two-point
angular clustering of point sources using nearest neighbours and correlation function statistics and find a weak, positive
signal for ~ 1130 sources in the 0.5-2 keV band, but no correlation for ~ 400 sources in the 2-10 keV band below scales of
100 arcsec. A sub-sample of ~ 200 faint sources with hard X-ray count ratios, that is likely to be dominated by obscured AGN,
does show a positive signal with the data allowing for a large angular correlation length, but only at the ~ 2 (3) $\sigma$ level, based
on re-sampling (Poisson) statistics. We discuss possible implications and emphasize the importance of wider, complete surveys
in order to fully understand the large scale structure of the X-ray sky.
1. Introduction

With the latest generation of X-ray satellites, XMM-Newton and Chandra, it has become possible to easily identify active galactic nuclei (AGN) and galaxy clusters, and map out their distribution to high redshifts (Brandt & Hasinger 2005; Rosati et al. 2002). We are finally in a position to answer questions such as: In what environments do AGN preferentially form? Are AGN formation and fuelling influenced by large-scale structure, or are their properties decided by factors local to the AGN and its hosting bulge alone? Is there a dependence of AGN obscuring column density on their larger-scale environment?

Several previous AGN spatial and angular clustering measurements have been carried out in X-rays and provide a mixed picture. The Einstein and ROSAT missions sampled bright and typically unobscured AGN populations, resulting in detections of moderate clustering signatures (e.g., Vikhlinin & Forman 1995; Carrera et al. 1998; Fabian & Barcons 1992, and references therein). In the optical, Wake et al. (2004) have found that Seyferts in the Sloan Digital Sky Survey at z < 0.2 selected on the basis of their [O III] or [N II] emission line strengths are unbiased tracers of mass, with neither their auto-correlation properties, nor cross-correlation with galaxies showing significant excess above the field. How these results extend to high redshift and connect with AGN selected at other wavelengths is a subject of intense study.

More recent measurements between 2 and 10 keV are capable of probing through increasing columns of absorbing material associated with the tori of obscured AGN. Since obscured AGN outnumber their unobscured counterparts by a factor of anywhere between 3 and 10 (Maiolino & Rieke 1995; Matt et al. 2000) and since X-ray surveys select AGN much more efficiently than at other wavelengths (Brandt et al. 2004), hard-band studies (above ~ 2 keV) are essential to draw conclusions from a representative AGN census. Such work has been carried out by Yang et al. (2003) with Chandra and Basilakos et al. (2004) with XMM-Newton over areas covering 0.4 deg$^2$ and 2 deg$^2$ respectively, and both find a significant auto-correlation signal, possibly associated with the distribution of obscured AGN.

Send offprint requests to: P. Gandhi, e-mail: pg@ast.cam.ac.uk
* Based on observations obtained with XMM-Newton, an ESA science mission with instruments and contributions directly funded by ESA Member States and NASA.

In this paper, we describe initial results on the properties
and distribution of X-ray-detected AGN in a large survey: the
XMM-Newton Large Scale Structure survey$^1$ (hereafter XMM-LSS;
Pierre et al. 2004). This is a contiguous, wide-area (currently
~6 deg$^2$) survey with the primary goals of studying the
physical properties of cluster/group populations; the impact of 
environment on star, AGN and galaxy formation; and, recip- 
rocally, the effect of star formation activity on cluster prop- 
erties. This is currently the widest, medium-deep survey of a 
contiguous patch of the X-ray sky with spatial resolution better 
than 10 arcsec above 2 keV. Full characterization of the nature 
of the detected X-ray population will be possible with exten- 
sive multi-wavelength follow-up currently under way. In addi- 
tion, the XMM-LSS area and sub-regions have already been (or 
will be) observed as part of numerous large and 'legacy' sur- 
veys at other wavelengths, including the radio (VLA; Cohen 
et al. 2003), optical (CFHT, VLT; Le Fèvre et al. 2004),
near-infrared (UKIDSS$^2$) and mid- to far-infrared wavelengths
(Spitzer; Lonsdale et al. 2003). Initial follow-up of cluster can- 
didates has proven highly successful. Results include the con- 
firmation of several high-redshift clusters at z > 0.6 (Andreon 
et al. 2005; Valtchanov et al. 2004), extension of the lower- 
redshift (0.3 < z < 0.6) sample to the luminosity regime of 
poor groups and clusters (Willis et al. 2005a,b) as well as com- 
pilation of the highest sky density cluster sample to date (Pierre 
et al. 2006). 

While the full XMM-LSS cluster survey is expected to pro- 
vide sensitive measurements and consistency checks of cos- 
mological parameters (Refregier et al. 2002), more than ~ 80 
per cent of the X-ray sources detected to the flux limits of 
the survey are point sources, predominantly AGN. We present 
the basic properties of the detected point-like sources in the 
XMM-LSS field, including distributions of source flux above 
specific sensitivity limits (§ 2.3) and a measurement of the re- 
solved fraction of the X-ray background (§ 2.4). This is fol- 
lowed by an analysis of the projected two-point correlations 
on the sky of various sub-samples of point-like sources (§ 3), 
including a sample with hard count ratios most likely domi- 
nated by obscured AGN. Lastly, we compare these results with 
previous work and discuss possible implications (§ 4). While 
some source classification and separation is discussed in § 2.2, 
detailed identification and follow-up (still in progress) will be 
presented in future works. 

Analysis of these sources over a sub-region of guaranteed- 
time pointings (hereafter referred to as the 'G pointings') cov- 
ering ~ 3 sq degs has been described by Chiappetti et al. 
(2005). The present paper is an extension of these results to 
cover the point-sources detected in the full area of the XMM- 
LSS pointings observed so far, and to study their two-point an- 
gular correlations. We include an additional 30 pointings (here- 
after referred to as the 'B pointings') observed as part of guest 
observer time. Additionally, the source detection pipeline used 
for our work is different from that of Chiappetti et al. While
these last authors use well-tested, standard XMM-SAS point-source
detection algorithms (eboxdetect, emldetect etc.), the main driver
of the XMM-LSS survey to detect faint groups and clusters has
motivated the development of a custom-built wavelet technique with
a full profile fitting algorithm in order to best distinguish between
point-like and extended sources (Pacaud et al. 2006), large parts of
which we utilize in the present work.

1 http://vela.astro.ulg.ac.be/themes/spatial/xmm/LSS/

2 http://www.ukidss.org/

Throughout this paper, the hard band refers to the X-ray 
2-10 keV band; the soft band to 0.5-2 keV; and the term 'hard- 
spectrum' sources refers to sources with a hardness ratio of X- 
ray counts (HR; the relative excess of hard band counts over 
the soft band) greater than -0.2. Where required, we use the 
concordance cosmology (Spergel et al. 2003), unless otherwise 
stated. 
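
As an illustration of the 'hard-spectrum' criterion above, the following minimal sketch classifies a source from its band counts. It assumes the conventional definition HR = (H - S)/(H + S), which the text describes only in words; the function names are hypothetical and not part of the XMM-LSS pipeline.

```python
def hardness_ratio(hard_counts: float, soft_counts: float) -> float:
    """Return HR = (H - S) / (H + S); assumed conventional definition.

    The result lies between -1 (purely soft) and +1 (purely hard).
    """
    total = hard_counts + soft_counts
    if total <= 0:
        raise ValueError("need a positive total number of counts")
    return (hard_counts - soft_counts) / total


HR_THRESHOLD = -0.2  # threshold quoted in the text


def is_hard_spectrum(hard_counts: float, soft_counts: float,
                     threshold: float = HR_THRESHOLD) -> bool:
    """True if the source counts as 'hard-spectrum' (HR > threshold)."""
    return hardness_ratio(hard_counts, soft_counts) > threshold
```

For example, a source with equal counts in both bands has HR = 0 and would be labelled hard-spectrum under this convention.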

2. The sample of point-like sources in the 
XMM-LSS 

The XMM-LSS observations consist of 19 guaranteed-time (G) 
and 32 guest-observer time (B) overlapping pointings covering 
a total area of ~ 6 deg$^2$. The nominal exposure times are 20 ks
and 10 ks for the G and B pointings, respectively. One G pointing
and two B pointings$^3$ with high flaring background were
not analysed for this work; the remaining 48 pointings (18 G 
+ 30 B) are listed in Table 1, and the layout of the fields on 
the sky is shown in Fig. 1. Details of the X-ray observations 
are described in Pierre et al. (2004), and complete details of the 
detection pipeline and source classification will be presented 
in Pacaud et al. (2006). The unique feature of the pipeline 
is a custom-built maximum-likelihood profile fitting algorithm 
(Xanim) that runs on a list of initial detections found by first us- 
ing SExtractor (Bertin & Arnouts 1996) on wavelet-filtered, re- 
duced images in each pointing and energy band. This increases 
sensitivity towards faint, extended sources while properly ac- 
counting for Point-Spread-Function (PSF) variation with all 
instrumental and position-dependent effects (e.g., energy, off- 
axis position, bad pixels and CCD gaps). Herein, we present 
results obtained from pipeline runs on the 0.5-2 keV (soft) and 
2-10 keV (hard) band photon images, and refer the reader to 
Pacaud et al. (2006) for the full XMM-LSS pipeline algorithm. 

2.1. Source Detection and Photometry 

For each source, the maximum-likelihood normalization of the 
PSF profile over the local background determines the photom- 
etry in counts, after appropriately masking out any neighbour- 
ing sources by the use of segmentation maps and accounting for 
chip gaps. The fit is carried out over an aperture large enough to 
encompass the bulk of the counts (typically a 70 arcsec box for 
point-sources, and a larger aperture for extended ones, depend- 
ing on source extent). Conversion from count-rate (CR) to flux 
(F) is computed from a combination of the inverse conversion 
factors (CF) for each of the cameras, scaled by the exposure 
times as in Baldi et al. (2002). The conversion factors for in- 
dividual cameras were calculated using xspec (Arnaud 1996) 
and the latest available EPIC response matrices. The thin filter
response was considered, as for the actual observations. The
model used was an intrinsic power-law with photon index $\Gamma = 1.7$ affected by
Galactic absorption of $N_{\rm H} = 2.6 \times 10^{20}$ cm$^{-2}$, appropriate to the
XMM-LSS sight-line. The individual CFs are listed in Table 2.

3 These pointings have the following XMM-Newton Observation
IDs respectively: 0109520401, 0037981701 and 0147111501.

P. Gandhi et al.: XMM-LSS: point source properties and angular correlations

Field  ObsID         Field  ObsID         Field  ObsID
G01    0112680101    G18    0111110401    B15    0037981501
G02    0112680201    G19    0111110501    B16    0037981601
G03    0112680301    B01    0037980101    B18    0037981801
G04    0109520101    B02    0037980201    B19    0037981901
G05    0112680401    B03    0037980301    B20    0037982001
G06    0112681301    B04    0037980401    B21    0037982101
G07    0112681001    B05    0037980501    B22    0037982201
G08    0112680501    B06    0037980601    B23    0037982301
G09    0109520601    B07    0037980701    B24    0037982401
G10    0109520201    B08    0037980801    B25    0037982501
G11    0109520301    B09    0037980901    B26    0037982601
G13    0109520501    B10    0037981001    B27    0037982701
G14    0112680801    B11    0037981101    B28    0147110101
G15    0111110101    B12    0037981201    B29    0147110201
G16    0111110701    B13    0037981301    B30    0147111301
G17    0111110301    B14    0037981401    B31    0147111401

Table 1. The list of XMM-Newton pointings (field label and
Observation ID) considered for the present analysis. The nominal
durations of the G (guaranteed time) and B (guest observer) pointing
exposures are 20 and 10 ks respectively. Additional details at
http://cosmos.iasf-milano.inaf.it/~lssadmin/Website/LSS/Anc/

EPIC Camera    0.5-2 keV                  2-10 keV
MOS            $4.990 \times 10^{-12}$    $2.296 \times 10^{-11}$
pn             $1.460 \times 10^{-12}$    $7.912 \times 10^{-12}$

Table 2. The count-rate-to-flux conversion factors for the individual
EPIC cameras and energy bands, stated in units of erg s$^{-1}$ cm$^{-2}$ for
a rate of 1 ct s$^{-1}$. A power-law with photon index $\Gamma = 1.7$ affected by
Galactic absorption of $2.6 \times 10^{20}$ cm$^{-2}$ was assumed. Both MOS cameras
were assumed to be identical.
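
The count-rate-to-flux conversion of § 2.1 can be sketched as follows. This is a minimal illustration, not the pipeline code: it assumes that each camera collects counts equal to F x T / CF, so that the combined flux is the total counts divided by the exposure-weighted sum of inverse conversion factors. The conversion factors are those of Table 2; the function and dictionary names are hypothetical.

```python
# Conversion factors from Table 2: flux (erg s^-1 cm^-2) for a rate of 1 ct s^-1.
CONVERSION_FACTORS = {
    "soft": {"MOS": 4.990e-12, "pn": 1.460e-12},  # 0.5-2 keV
    "hard": {"MOS": 2.296e-11, "pn": 7.912e-12},  # 2-10 keV
}


def combined_flux(total_counts: float, exposures: dict, band: str) -> float:
    """Flux from counts summed over cameras (assumed combination rule).

    exposures maps camera names ("MOS1", "MOS2", "pn") to exposure
    times in seconds; cameras that are absent contribute nothing.
    """
    cf = CONVERSION_FACTORS[band]
    # Expected counts per unit flux, summed over the cameras.
    counts_per_unit_flux = (
        exposures.get("MOS1", 0.0) / cf["MOS"]
        + exposures.get("MOS2", 0.0) / cf["MOS"]
        + exposures.get("pn", 0.0) / cf["pn"]
    )
    if counts_per_unit_flux <= 0:
        raise ValueError("at least one camera needs a positive exposure")
    return total_counts / counts_per_unit_flux
```

With this rule, a single camera reduces to the familiar F = CF x CR, and longer-exposed cameras carry proportionally more weight.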

2.2. Source selection and classification 

Intensive follow-up of Chandra and XMM-Newton surveys has 
shown that the vast majority of X-ray detections at high galactic 
latitudes are associated with AGN, in deep as well as in shallow 
surveys (Brandt et al. 2004; Szokoly et al. 2004; Nandra et al. 
2004). The optical identifications of the associated counterparts 
are varied, and include Seyferts as well as late and early-type 
galaxies. Despite the fact that these do not always show obvious 
optical signs of AGN activity, nevertheless the presence of a 
powerful accretion source is indisputable, based on the inferred 
power of the X-ray source, and, in many cases, the detection of 
weak, high-ionization emission lines. In the XMM-LSS survey 
as well, the bulk (more than ~ 80 per cent) of X-ray detections 
will be due to AGN (compare with, e.g., Crawford et al. 2002). 
Stars and bremsstrahlung emission from galaxy clusters will 
constitute the remaining sample. 

Obvious stars were removed by cross-correlating our X-ray 
detections with the USNO survey (Zacharias et al. 2004) and 
identifying all optical point-like sources with a B-band magnitude
brighter than 13. This cross-correlation resulted in 17
sources being removed from the 0.5-2 keV sample, and these 
are not included in any of the following analysis. No stars coin- 
cided with a hard-band X-ray source above the nominal signif- 
icance threshold (see below). Though there will undoubtedly 
remain some contamination due to fainter non-AGN point-like 
sources, the ROSAT deep surveys have found that stars rep- 
resent only a minority of all X-ray detections (Schmidt et al. 
1998) and typically display soft X-ray spectral slopes. Thus, 
our results should not be affected significantly by these con- 
taminants, especially in the hard-band. 

In the process of identifying clusters and groups (hereafter, 
jointly referred to as clusters), extensive simulations are being 
carried out to determine the best X-ray classification param- 
eters, based on already published clusters (Valtchanov et al. 
2004; Willis et al. 2005a,b, amongst others). For the purpose 
of the present work, we take as 'extended', only those soft- 
band (0.5-2 keV) sources that have already been followed-up 
at other wavelengths and confirmed to be clusters, or found to 
have a high probability of being associated with such systems, 
based on i) optical-spectroscopy; ii) X-ray source profile ex- 
tent; and, iii) the presence of a red sequence of galaxies (see 
Pacaud et al. 2006 for full details). At the time of writing, this 
consists of 28 confirmed clusters, and 28 clusters with provisional
spectroscopic redshifts over the full 5.7 deg$^2$ of Fig. 1.
The rest of the sources in the soft band are classified as 'point- 
like' . In the 2-10 keV (hard) band, we treat all sources as being 
point-like. 

For the analysis presented in this paper, only those sources 
that lay within 10 arcmin of the optical axis centres of each 
pointing were retained. This was done in order to minimize bi- 
ases due to PSF distortion at large off-axis angles and possible 
confusion due to the same source being detected on adjacent 
pointings. This results in pseudo-contiguous coverage of the 
field, with holes in between neighbouring pointings (that are 
fully accounted for in our analysis; see below). 

Finally, the significance of source detection was estimated
by the value of signal-to-noise (S/N). The 'signal' used is that
corresponding to 68 per cent of the full background-subtracted
counts for each source, and the 'noise' assumes Poisson errors
on the total counts (source+background) within a fixed aperture
of radius 17 arcsec. Strictly speaking, this is consistent
only for on-axis point-sources$^4$, though the effect of the
off-axis PSF degradation is mitigated somewhat by our restriction
to the central 10 arcmin regions of each pointing. To compute
the Poisson error, we specifically use Eq. 7 of Gehrels (1986)
for the 1$\sigma$ upper-limit on the noise. Most results in the
following sections are presented for the sample with S/N>3, but we
also consider an effectively-fainter sample with S/N>2 in § 3.4,
and briefly mention results relevant to sub-samples with higher
S/N thresholds of 4 and 5.
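
The S/N definition above can be sketched in a few lines. This is a hedged illustration rather than the pipeline implementation: it takes Eq. 7 of Gehrels (1986) at the 1$\sigma$ level in its commonly used form, 1 + sqrt(n + 0.75), and the function names and argument conventions are assumptions.

```python
import math


def gehrels_upper_error(n_counts: float) -> float:
    """1-sigma upper Poisson error, Gehrels (1986) Eq. 7 with S = 1."""
    return 1.0 + math.sqrt(n_counts + 0.75)


def detection_snr(net_source_counts: float,
                  aperture_total_counts: float,
                  enclosed_fraction: float = 0.68) -> float:
    """S/N as described in the text.

    net_source_counts: full background-subtracted counts of the source.
    aperture_total_counts: source + background counts inside the fixed
        17 arcsec aperture (how these are measured is left to the caller).
    """
    signal = enclosed_fraction * net_source_counts
    noise = gehrels_upper_error(aperture_total_counts)
    return signal / noise
```

A source would enter the nominal sample when `detection_snr(...)` exceeds 3.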



4 http://xmm.vilspa.esa.es/external/xmm_user_support/documentation/uhb/node17.html
Fig. 1. The layout of the 48 pointings of the XMM-LSS survey. Smoothed 0.5-10 keV photon images for all cameras have been coadded
with the same scaling in counts for display (without exposure map correction). Each pointing has a full field-of-view diameter of just under
30 arcmin. The placement is such that most adjacent pointings overlap beyond ~10 arcmin from the respective optical axis centres (the pointings
are labelled with their field names [compare with Table 1] at approximately these centres). In the present work, we restricted the analysis to 
the respective central 10' regions, outlined for three pointings only (for clarity) at the top right as the 10' -radius large circles. The limiting 
sensitivity varies as a function of the exposure times, background level and off-axis angles. Many of the brighter X-ray sources are easily 
discerned in black. The longer-exposed guaranteed time pointings (G) are in the southern part of the survey; the rest (B pointings) being 
obtained during guest observer time. 
Selection Criterion    Classification      0.5-2 keV    2-10 keV
S/N>3 (B+G)            Point sources       1134         413
"                      Extended sources    36           0$^a$
"                      Stars               17$^*$
S/N>2 (B+G)            Point sources                    912
S/N>2 (G)              Point sources                    473
"                      1>HR>-0.2                        209
"                      1>HR>-0.2                        140

Table 3. Sizes of various sub-samples of X-ray sources analyzed in
this paper. For the S/N threshold of 2, only hard-band sources are
analyzed in § 3.4. The parentheses in the first column specify whether
the sample was selected over the whole area (B+G pointings), or only
over the deeper G pointings. Notes: $^a$ All sources were considered as
point-like in the 2-10 keV band. If we relax this condition, only 8
sources are affected. $^*$ Only obvious stars with B < 13 were identified
(see text), and these are counted separately from 'point sources'.



2.3. Results: Sky coverage and logN-logS 

Sensitivity maps for each pointing were constructed by using 
the background measured by the source detection pipeline. The 
minimum number of counts necessary at any spatial pixel on 
a pointing (above the local background) in order for a source 
to have a S/N matching the adopted threshold is computed, 
and converted to a limiting-flux using the exposure maps and 
conversion-factors. The sky coverage is computed by summing 
the area covered by these sensitivity maps as a function of 
limiting-flux, and is shown for the hard and soft bands in Fig. 2, 
for the nominal S/N threshold of 3. The longer-exposed guar- 
anteed time (G) pointings have a deeper sensitivity, but smaller 
overall coverage than the guest observer (B) pointings. The 
maximum coverage at the brightest fluxes is close to 4.2 deg$^2$;
overlap between adjacent pointings in this selected area of the
central 10 arcmin regions is negligible (only 0.3 per cent of the
total).
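
The sky-coverage computation described above amounts to counting sensitivity-map pixels. A minimal sketch, assuming the limiting fluxes of all pixels in the retained regions have been flattened into one list and that every pixel subtends the same solid angle (both are simplifications; the names are hypothetical):

```python
def sky_coverage(limiting_flux_pixels, pixel_area_deg2: float, flux: float) -> float:
    """Area (deg^2) over which a source of the given flux reaches the S/N threshold.

    limiting_flux_pixels: iterable with the limiting flux of each
        sensitivity-map pixel (all pointings concatenated).
    """
    n_sensitive = sum(1 for limit in limiting_flux_pixels if limit <= flux)
    return n_sensitive * pixel_area_deg2
```

Evaluating this function on a grid of fluxes gives a coverage curve of the kind plotted in Fig. 2.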

Above a S/N threshold of 3, we find a total of 1134 and 413
point sources in the soft and hard bands respectively (Table 3
lists the numbers of sources in various bands and with different
selection criteria as discussed in the text). The logN-logS
in each band (also shown in Fig. 2) was computed by identifying
all sources with a detected flux above any given value S
and summing the inverse areas over which these sources could
have been detected, as measured from the sky coverage. The
flux distributions of the detected sources peak at ~ $5 \times 10^{-15}$ and
~ $1.5 \times 10^{-14}$ erg s$^{-1}$ cm$^{-2}$ in the soft and hard bands respectively,
and at these flux levels, the logN-logS is in excellent
agreement with other published surveys (e.g., we show the fit
of Baldi et al. 2002 in Fig. 2). The faintest fluxes detected
(typically near the optical axes) are ~ 2 and $8 \times 10^{-15}$ erg s$^{-1}$ cm$^{-2}$
in the two bands, respectively. At the lowest fluxes in the soft
band, our logN-logS begins to drop off due to incompleteness,
while at bright fluxes in both bands, our logN-logS is lower
than (but consistent with the 1$\sigma$ lower limit of) other surveys.
This difference was also observed by Chiappetti et al. (2005)
over the area of the G pointings, based on a completely different
source detection procedure (their logN-logS is also shown
in Fig. 2). We find excellent agreement with Chiappetti et al.
(2005, especially in the soft band) and refer to their paper for
power-law fits and discussion of the logN-logS. The agreement
gives us confidence that the noted deficit is intrinsic and
not due to pipeline systematics.
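
The logN-logS construction described above (summing inverse areas over all sources brighter than S) can be sketched as below. The coverage function is passed in by the caller and is assumed to return the sky area in deg$^2$ at a given flux; the names are hypothetical.

```python
def cumulative_counts(fluxes, coverage_at_flux):
    """Return [(S, N(>=S))] with N in sources per deg^2, brightest first.

    Each source contributes 1 / (area over which it could have
    been detected), as described in the text.
    """
    result = []
    running_total = 0.0
    for flux in sorted(fluxes, reverse=True):
        running_total += 1.0 / coverage_at_flux(flux)
        result.append((flux, running_total))
    return result
```

Faint sources, detectable over only a small area, therefore receive a larger weight than bright ones.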

2.4. Results: Resolved fraction of the X-ray 
background 

Only with the new generation of X-ray satellites has the hard 
cosmic X-ray background radiation been resolved substantially 
into discrete sources (a combination of obscured and unob- 
scured AGN), supporting the basic tenet of AGN Unification 
(Setti & Woltjer 1989; Mushotzky et al. 2000). While the soft 
X-ray background below 2 keV can be accounted for almost
completely as a combination of Galactic emission, AGN and
clusters (e.g., Fabian & Barcons 1992), the exact fraction
resolved out in the hard band remains a contentious
issue. There is even intriguing evidence that a new population 
of obscured AGN remains undiscovered in even the deepest 
surveys (De Luca & Molendi 2004; Worsley et al. 2004). 

We compute the resolved intensity of detected sources by
summing the flux of each source divided by the area over which
the source could have been detected, similarly to the computation
of the logN-logS. The result is shown
in Fig. 3 and tabulated in Table 4. At the nominal flux limit
of the XMM-LSS survey (with S/N>3), we resolve close to
30 per cent of some of the latest measurements of the 2-10 keV X-ray
background (De Luca & Molendi 2004; Hickox & Markevitch 
2006). This matches well with the level resolved in other sur- 
veys over our flux regime (e.g., Manners et al. 2003). Note that 
hard-spectrum sources (discussed further in § 3.4) contribute a 
larger fraction at fainter fluxes due to their increasing domi- 
nance, relative to those with softer X-ray spectra. 
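
The resolved-intensity sum described in this section can be sketched as follows. The total background intensity is the De Luca & Molendi (2004) value used for Table 4; the coverage function is supplied by the caller and the function names are hypothetical.

```python
# Total 2-10 keV background intensity (erg s^-1 cm^-2 deg^-2),
# De Luca & Molendi (2004), as adopted for Table 4.
XRB_TOTAL_INTENSITY = 2.4e-11


def resolved_intensity(fluxes, coverage_at_flux) -> float:
    """Sum of flux / detectable area over all sources (erg s^-1 cm^-2 deg^-2)."""
    return sum(flux / coverage_at_flux(flux) for flux in fluxes)


def resolved_fraction(intensity: float,
                      total: float = XRB_TOTAL_INTENSITY) -> float:
    """Fraction of the background accounted for by the detected sources."""
    return intensity / total
```

As a consistency check, the S/N>3 intensity of $6.6 \times 10^{-12}$ erg s$^{-1}$ cm$^{-2}$ deg$^{-2}$ in Table 4 corresponds to a fraction of about 0.28.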

3. Angular Correlation Statistics 

Full clustering analysis requires spectroscopic redshifts, since 
even optimistic photometric redshift errors of $\Delta z \sim 0.1$ translate
into typical physical separations of hundreds of Mpc,
washing out any intrinsic clustering signal. Until the on-going 
multi-wavelength follow-up of X-ray sources produces a useful 
number of accurate redshifts, we restrict our study to the areal 
distribution of sources only. 
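
The scale of the photometric-redshift smearing quoted above can be checked with a short calculation. This sketch uses the line-of-sight comoving interval, approximately c $\Delta z$ / H(z), in a flat $\Lambda$CDM model; the parameter values (H$_0$ = 70 km s$^{-1}$ Mpc$^{-1}$, $\Omega_m$ = 0.3) are illustrative assumptions and not necessarily those adopted in the paper.

```python
import math

SPEED_OF_LIGHT_KM_S = 299792.458


def line_of_sight_separation_mpc(z: float, delta_z: float,
                                 h0: float = 70.0,
                                 omega_m: float = 0.3) -> float:
    """Comoving separation (Mpc) spanned by a small redshift interval.

    Uses dr = c * dz / H(z) with H(z) = H0 * sqrt(Om (1+z)^3 + 1 - Om),
    i.e. a flat universe with a cosmological constant (assumed here).
    """
    e_z = math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))
    return SPEED_OF_LIGHT_KM_S * delta_z / (h0 * e_z)
```

At z = 1, a redshift error of 0.1 corresponds to roughly 240 Mpc under these assumptions, far larger than typical clustering lengths.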

Angular clustering is related to the excess probability of 
finding source pairs at any given angular separation $\theta$, relative
to a sample distributed with uniform probability. A variety
of methods have been used for studying clustering properties 
of astronomical surveys, including power spectrum (Webster 
1976), counts-in-cells (e.g., Carrera et al. 1998) and fractal 
analyses (Joyce et al. 1999), each of which can be imple- 
mented either in real, or in projected space. Here, we describe 
results obtained from nearest-neighbours and correlation func- 
tion statistics for the point-like sources in the XMM-LSS. 

Fig. 2. The sky coverage (left) and logN-logS (right) of the XMM-LSS sample within the central 10 arcmin-radius pointing regions, for the
2-10 keV (top) and 0.5-2 keV (bottom) bands, for a threshold S/N>3 in both bands. The sky coverage is shown separately for the guaranteed time
(deeper; dots-dashed; marked 'G') and guest observer time (shallower; dashed; marked 'B') pointings. The logN-logS is shown for all sources
(clusters have a minor contribution, except at bright fluxes in the soft band; their contribution is shown as the triangles; marked 'extended'). The
errors on the logN-logS denote 1$\sigma$ Poisson uncertainties on the independent differential count bins, subsequently scaled to the integral counts.
The results of the HELLAS 2XMM survey (Baldi et al. 2002) and XMM-Newton Medium Deep Survey (XMDS; Chiappetti et al. 2005) are
also overplotted for comparison.

S/N threshold    Faintest flux              Median flux                XRB sky intensity                      Resolved fraction (per cent)
                 (erg s$^{-1}$ cm$^{-2}$)   (erg s$^{-1}$ cm$^{-2}$)   (erg s$^{-1}$ cm$^{-2}$ deg$^{-2}$)    All sources    Hard-spectrum sources    Soft-spectrum sources
S/N>3            $8.5 \times 10^{-15}$      $2.6 \times 10^{-14}$      $6.6 \times 10^{-12}$                  28             12                       16
S/N>2            $5.1 \times 10^{-15}$      $1.8 \times 10^{-14}$      $7.7 \times 10^{-12}$                  33             16                       17

Table 4. Resolved fraction of the X-ray background over the B+G pointings for sub-samples with different S/N thresholds. The resolved
fraction is stated as a percentage of the total intensity of $2.4 \times 10^{-11}$ erg s$^{-1}$ cm$^{-2}$ deg$^{-2}$ found by De Luca & Molendi (2004). The resolved
fraction is further sub-divided into the contribution of hard-spectrum and soft-spectrum sources (see § 3.4).

3.1. Generation of random (uncorrelated) catalogues

The most straightforward way to account for instrument and
pipeline selection effects in the survey (e.g., off-axis vignetting,
holes between adjacent pointings, chip gaps) is to simulate an
ensemble of catalogues over randomly-chosen sky positions
that also correctly accounts for these effects. Source fluxes were
randomized to have an overall distribution similar to the data. A
random flux ($S_r$) can be sampled from the data logN-logS by
choosing a random number ($p$) uniformly distributed between
0 and 1, and searching for the flux $S_r$ that satisfies the following
transformation:

$$N(>S_r) = N(>S_{\rm lim}) \times [1 - p] \qquad (1)$$

where $N(>S_r)$ is the value of the logN-logS at $S_r$ and $S_{\rm lim}$
is the limiting flux above which the logN-logS is defined (see
also Mullis et al. 2004). Sky positions for these random catalogues
were assigned with equal probability within the central
10' regions of the pointings. If the limiting-flux at the assigned
coordinates was larger (worse) than the randomly-chosen flux
for the source, the source was discarded, and another one
randomized in its place. We verified that the logN-logS
distributions of the random catalogues were similar to those of the
parent catalogues used to generate them in each band and with
each selection criterion.
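
The flux sampling of Eq. (1) and the subsequent rejection step can be sketched as follows. This is a minimal illustration under stated assumptions: the logN-logS is supplied as a table of (S, N(>S)) pairs sorted by increasing flux and is inverted as a step function, and the position generator and sensitivity lookup are hypothetical callables provided by the caller.

```python
import random


def sample_flux(lognlogs_table, rng):
    """Draw S_r satisfying N(>S_r) = N(>S_lim) * (1 - p), Eq. (1).

    lognlogs_table: list of (S, N(>S)) sorted by increasing S, so that
        the first entry is (S_lim, N(>S_lim)) and N decreases.
    """
    p = rng.random()
    target = lognlogs_table[0][1] * (1.0 - p)
    for flux, n_above in lognlogs_table:
        if n_above <= target:  # first tabulated flux at or below the target
            return flux
    return lognlogs_table[-1][0]


def draw_random_source(lognlogs_table, random_position, limiting_flux_at,
                       rng, max_tries=100000):
    """Redraw flux and position until the source is detectable there."""
    for _ in range(max_tries):
        flux = sample_flux(lognlogs_table, rng)
        position = random_position(rng)
        if limiting_flux_at(position) <= flux:
            return position, flux
    raise RuntimeError("no detectable source drawn; check the sensitivity map")
```

The `max_tries` guard is an addition for safety in this sketch; the text itself simply describes redrawing until a source is accepted.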

This procedure assumes that a source with a randomly gen- 
erated flux S r would have been detected with exactly that flux 
at the assigned sky position. In reality, any detection procedure 
will introduce shot noise, which results in some faint sources
below the nominal threshold still being detected due to Poisson
uncertainties 'boosting' their counts. On the other hand, some 
bright sources may have their counts 'depressed' and thus lie 
below the detection threshold. We simulated this by convert- 
ing the flux S r to counts (with the use of local exposure maps 
and conversion-factors) and drawing a random number from 
the Poisson distribution of the total counts (source+local back- 
ground). The source is then retained in the final random cata- 
logue with a probability that the Poisson distribution exceeds 
the total limiting counts required by the S/N threshold. This 
simulation of shot noise was used for all random catalogues in 
each of the bands and sub-samples tested. 
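
The shot-noise step can be sketched by computing the Poisson tail probability directly and keeping the source with that probability. This is an illustrative reading of the procedure, assuming that 'exceeds' means 'reaches or exceeds' the limiting counts; the function names are hypothetical.

```python
import math
import random


def poisson_tail_probability(mean: float, threshold_counts: float) -> float:
    """P(X >= threshold_counts) for X drawn from Poisson(mean)."""
    k_min = math.ceil(threshold_counts)
    if k_min <= 0:
        return 1.0
    if mean <= 0:
        return 0.0
    # Sum the probability mass below the threshold in log space.
    cdf_below = sum(
        math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))
        for k in range(k_min)
    )
    return max(0.0, 1.0 - cdf_below)


def keep_after_shot_noise(expected_total_counts: float,
                          limiting_total_counts: float,
                          rng) -> bool:
    """Retain a simulated source with the Poisson tail probability."""
    prob = poisson_tail_probability(expected_total_counts, limiting_total_counts)
    return rng.random() < prob
```

Sources well above the limit are almost always kept, while sources slightly below it are occasionally 'boosted' into the catalogue, as described in the text.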

The above manual introduction of shot noise was neces- 
sary because full simulations of the entire dataset proved to be 
unrealistic. Still, as a cross-check, we did carry out full sim- 
ulations of three pointing exposures by randomly generating 
sources with appropriate off-axis and energy-dependent PSFs 
for each camera, along with simulated backgrounds in accor- 
dance with Read & Ponman (2003), all scaled for the respec- 
tive pointing exposure times. Our full detection pipeline was 
then run over these. We verified that the statistical distribution 
of sources over these simulated pointings compares well with 
the random catalogues above. 

Fig. 3. Cumulative X-ray background intensity in the 2-10 keV band
for the detected point sources in the full field of the XMM-LSS survey.
Arrows denote the faintest flux limits of sub-samples compiled with
S/N thresholds of 3 (right arrow) and 2 (left arrow), respectively. The
dashed line is the total X-ray background measurement reported by De
Luca & Molendi (2004); the dotted line is the measurement of Hickox
& Markevitch (2006) converted to 2-10 keV assuming a power-law
with photon index $\Gamma = 1.4$. The resolved source contribution is
sub-divided into soft-spectrum and hard-spectrum sources (discussed in
§ 3.4).

With an XMM-Newton Mirror Module PSF characteristic
size of ~ 6 arcsec, there is a small, but finite probability that
blending of two close, neighbouring sources could result in
them being classified as a single, extended source. The XMM-LSS
pipeline has been designed to optimally detect and probe
the evolution of extended sources, and we thus make use of its
strength of separating point-like sources from extended clusters.
We studied the effect of blending through several sets of
simulations and confirmed that our pipeline is able to resolve
close-by pairs successfully, though the efficiency of doing so
depends on source flux as well as on pair separation. While no
sources will be resolved at separations of less than ~ 6 arcsec,
the pipeline resolves effectively all source pairs at separations
of > 30 arcsec. Based on the simulations and observed source
counts, the efficiency of pair resolution at a separation of ~20
(10) arcsec was determined to be ~70 (30) per cent. Further
details will be presented in Pacaud et al. (2006), but for our
purposes, we simply remove a corresponding fraction of such
close pairs from the random catalogues in order to simulate this
blending.
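
One way to apply the quoted pair-resolution efficiencies to a random catalogue is sketched below. The four anchor values come from the text; the linear interpolation between them is purely an assumption made for illustration, as is the function naming. The text does not state how the efficiency varies between the quoted separations, nor its dependence on flux, which is ignored here.

```python
import random

# (separation in arcsec, fraction of pairs resolved), values quoted in the text.
EFFICIENCY_POINTS = [(6.0, 0.0), (10.0, 0.3), (20.0, 0.7), (30.0, 1.0)]


def resolution_efficiency(separation_arcsec: float) -> float:
    """Fraction of pairs resolved; linear interpolation is an assumption."""
    if separation_arcsec <= EFFICIENCY_POINTS[0][0]:
        return 0.0
    if separation_arcsec >= EFFICIENCY_POINTS[-1][0]:
        return 1.0
    for (x0, y0), (x1, y1) in zip(EFFICIENCY_POINTS, EFFICIENCY_POINTS[1:]):
        if x0 <= separation_arcsec <= x1:
            return y0 + (y1 - y0) * (separation_arcsec - x0) / (x1 - x0)
    raise AssertionError("unreachable for a sorted efficiency table")


def pair_is_resolved(separation_arcsec: float, rng) -> bool:
    """Randomly decide whether a close random pair survives as two sources."""
    return rng.random() < resolution_efficiency(separation_arcsec)
```

Pairs for which `pair_is_resolved` returns False would be treated as blended and thinned from the random catalogue.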

A hundred simulated catalogues are generated for each 
band and sub-sample studied, each with the same number of 
sources as the parent data catalogue. The average number of 
data and random pairs in the ensemble are then counted for the 
computation of correlation statistics. 

3.2. Results: Angular Nearest Neighbours Statistic 

First, we show the distribution of projected separations of 
point-like sources from their first nearest neighbours (NN). 
This is shown as the cumulative (normalized) distribution in 
Fig. 4, compared to the average distribution of nearest neigh- 
bours for 100 random catalogues, in both the soft and the 



Fig. 4. Cumulative nearest-neighbour distribution function (statistic) 
for the soft (bottom) and hard (top) bands for the point-sources with 
S/N>3 (filled circles) compared to the average statistic of 100 random 
catalogues. The plot shows the fraction of sources with an angular 
separation less than any given $\theta$ to their respective nearest neighbours.
The soft band distribution has an excess of pairs compared to random, 
as opposed to the hard band sample. 



hard bands. An excess of nearest neighbours is observed in the 
soft band (bottom plot) below ~100 arcsec. A Kolmogorov-
Smirnov (K-S) test returns a small probability of $\approx 10^{-3}$ for
the null hypothesis that the two distributions are identical, im- 
plying possible clustering in this band. In the hard band, no 
such excess is observed, and the K-S probability is consistent 
with the data and random distributions being drawn from the 
same population. 
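
A minimal sketch of the nearest-neighbour comparison is given below, assuming flat-sky coordinates in arcsec. The function names are hypothetical, and only the K-S statistic (the maximum gap between the two cumulative distributions) is computed; converting it to a null-hypothesis probability is left out.

```python
import math


def nearest_neighbour_separations(points):
    """Distance from each point to its nearest neighbour (flat-sky approximation)."""
    seps = []
    for i, (x1, y1) in enumerate(points):
        best = min(math.hypot(x1 - x2, y1 - y2)
                   for j, (x2, y2) in enumerate(points) if j != i)
        seps.append(best)
    return seps


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d_max = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both CDFs past every value equal to x before comparing them.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d_max = max(d_max, abs(i / len(a) - j / len(b)))
    return d_max
```

In practice the data separations would be compared with the separations pooled or averaged over the random catalogues.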

3.3. Results: Angular Correlation Function 

Optimal estimators are widely used to quantify the overall ex- 
cess of data-data pairs over random-random ones at differ- 
ent scales: we chose to use the Hamilton estimator (Hamilton 
1993) for the angular correlation function (ACF), but found 
very similar results using others, e.g., that of Efstathiou et al. 
(1991). Excess clustering compared to a uniform distribution is 
parametrized in terms of $\omega(\theta)$, defined as

$$\omega(\theta) = f\,\frac{DD(\theta)\,RR(\theta)}{DR(\theta)\,DR(\theta)} - 1 \qquad (2)$$



where DD, RR and DR are the number of data-data, random-random
and data-random pairs at separation $\theta$, all subjected
to the survey selection effects. The normalizing factor $f$ is
$4 N_D N_R / [(N_D - 1)(N_R - 1)]$, where $N_D$ and $N_R$ are the number of
sources in the data and random catalogues, respectively. Source
pairs were binned in equal logarithmic intervals of $\theta$, the bin
sizes being chosen to include at least ~20 pairs in each bin in
order to minimize the effect of small-number statistics. This
restriction also defined the minimum pair-separation bin over
which $\omega$ is plotted and fitted: this typically lies between 20
and 50 arcsec. In any case, the overall results presented below are
not sensitive to reasonable binning choices.
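
As an illustration, Eq. 2 for a single separation bin can be written as the short function below. The name `hamilton_acf` is hypothetical, and the pair counts are assumed to be raw counts of unique pairs in the bin.

```python
def hamilton_acf(dd, rr, dr, n_d, n_r):
    """Hamilton (1993) estimator for one separation bin, following Eq. 2.

    dd, rr, dr: raw data-data, random-random and data-random pair counts.
    n_d, n_r:   number of sources in the data and random catalogues.
    """
    f = 4.0 * n_d * n_r / ((n_d - 1) * (n_r - 1))
    return f * dd * rr / (dr * dr) - 1.0
```

With this normalization, counts that scale exactly as expected for unclustered catalogues (DD proportional to $N_D(N_D-1)/2$, RR to $N_R(N_R-1)/2$ and DR to $N_D N_R$) give $\omega = 0$.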
We show the ACF results for point-like sources in Fig. 5.
The plotted error-bars are Poisson 1-$\sigma$ uncertainties, calculated
for each bin as $(1 + \omega)/\sqrt{DD}$. In the soft band, there is an
overall positive auto-correlation signal at most scales smaller
than ~1000 arcsec. The amplitude of this correlation is small:
if we characterize $\omega$ as a power-law of the form



$$\omega(\theta) = (\theta_0/\theta)^{\gamma - 1} \qquad (3)$$



we find $\theta_0$ = 6.3 ± 3 arcsec, with a slope of $\gamma$ = 2.2 ± 0.2, with
the quoted errors being appropriate to 68 per cent confidence
intervals for one interesting parameter. By simply counting the
excess number of data-data pairs compared to random-random
ones with all separations less than, say, 200 arcsec, we find
an excess at the 2.3$\sigma$ level. This result was measured for the
sample of 1134 point-like sources alone. The auto-correlation 
signal is almost identical for the combined catalogue of 1170 
soft-band detections, including the 36 extended sources, sug- 
gesting that there is no obvious and strong cross-correlation 
signal between the extended and point-like samples; however, 
this will be studied in detail once a larger, complete sample of 
extended sources is compiled. 

In the hard band, in terms of numbers of pairs, we detect 58 (251) independent
data pairs with separations of < 100 (200) arcsec, while
the random catalogues contain 58 (230) pairs over the same 
scales, on average. This result is consistent with the null hy- 
pothesis implied by the nearest neighbour distribution. Though 
statistics are small, we find similar null results for sub-samples 
of hard-band sources selected with S/N>4 (209 sources) or 
with S/N >5 (123 sources). 
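
The pair counts quoted above can be turned into a rough significance as sketched here. The convention used (excess divided by the square root of the expected random count) is an assumption of this sketch and may differ from the exact procedure used for the numbers in the text.

```python
import math


def pair_excess_sigma(n_data_pairs, n_random_pairs):
    """Excess of data pairs over the random expectation, in units of the
    Poisson scatter of the expected count (an assumed convention)."""
    return (n_data_pairs - n_random_pairs) / math.sqrt(n_random_pairs)


# Pair counts quoted in the text for separations < 100 and < 200 arcsec.
excess_100 = pair_excess_sigma(58, 58)
excess_200 = pair_excess_sigma(251, 230)
```

Under this convention the excess is zero below 100 arcsec and roughly 1.4 sigma below 200 arcsec, in line with a null result.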

We note that bias (underestimation of $\omega$) related to the integral
constraint (the fact that a finite sky area with an unknown
source density is used to estimate the correlation signal) is
negligible for our field. Assuming the power-law form of Eq. 3
for the intrinsic correlation function, the bias can be estimated
by numerical integration of
$(\theta_0^{\gamma-1}/\Omega^2) \int\!\!\int \theta^{1-\gamma}\, d\Omega_1\, d\Omega_2$ over the
whole area ($\Omega$) of the survey. For a range of relevant power-law
parameters, we find this bias to be ~ 0.01, a quantity sufficiently
small to ignore during fitting.
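
One way to carry out such a numerical integration is by Monte Carlo sampling of random pairs of positions, since the double integral divided by $\Omega^2$ is just the mean of the power law over all pairs. The sketch below assumes a flat rectangular field (the real survey footprint is more complex) and a small floor `theta_min` on the separation to keep the estimate stable; both are assumptions of this illustration, as are the function and parameter names.

```python
import math
import random


def integral_constraint(theta0, gamma, width, height, n_pairs=20000,
                        theta_min=1.0, seed=0):
    """Monte Carlo estimate of theta0^(gamma-1)/Omega^2 times the double
    integral of theta^(1-gamma) over a flat rectangle (angles in arcsec).

    theta_min is an assumed floor on the pair separation.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        # Two independent uniform positions inside the rectangle.
        dx = rng.uniform(0.0, width) - rng.uniform(0.0, width)
        dy = rng.uniform(0.0, height) - rng.uniform(0.0, height)
        theta = max(math.hypot(dx, dy), theta_min)
        total += (theta0 / theta) ** (gamma - 1.0)
    return total / n_pairs


# Soft-band best-fit parameters over an assumed 2 x 2.5 deg rectangle.
bias = integral_constraint(6.3, 2.2, 7200.0, 9000.0)
```

The estimate is only well behaved for slopes shallow enough that the integrand remains integrable at small separations; steeper slopes are dominated by the assumed cutoff.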

3.4. Results: Sources with hard X-ray spectral indices 

The above result on the non-detection of clustering in the 2- 
10 keV band seems to be at odds with the findings of Yang 
et al. (2003) and Basilakos et al. (2004). In both these works, 
excess clustering was detected in the hard band, and associated 
with the clustering of obscured AGN, which should constitute 
a larger fraction of the hard-band detections as compared to 
the soft band. We have made predictions for the fraction of 
AGN detected in the XMM-LSS survey that are expected to 
be obscured (see § 4.2 below) and find that, at the flux limits 
probed by our sample, obscured AGN will represent approximately
30-40 per cent of the hard band detections, and about
10 per cent of the soft band sample; this fraction increases with 
decreasing flux levels. Obscured AGN can be efficiently (but 
not uniquely) selected by computing the hardness ratio (HR) 



[Fig. 5 plot; x-axis: Pair Separation $\theta$ (arcsec).]



Fig. 5. The ACF, as defined by Hamilton (1993) and measured for the 
XMM-LSS survey in the soft (bottom) and hard (top) bands for the 
samples with S/N>3. The solid curve is the best-fit power-law model 
(shown for the soft band only), while the dotted line marks $\omega$ = 0. The
y-axes are plotted on a linear-scale to aid visualization of the signifi- 
cance of any correlation and the axes ranges are kept the same in both 
bands for comparison. Previous power-law ACFs of Basilakos et al. 
(2004, for the hard band) and of Vikhlinin & Forman (1995, for the 
soft band) are shown as the dashed and dot-dashed lines respectively. 

of source counts in the hard (H) and the soft (S) bands. If we 
define 



$$\mathrm{HR} = \frac{H - S}{H + S}, \qquad (4)$$



then a large fraction of sources with HR>-0.2 (hereafter, 'hard- 
spectrum sources') are likely to be obscured AGN. This limit 
corresponds to an obscuring column density of approximately 
$10^{22}$ cm$^{-2}$ due to cold gas local to a source with an intrinsic
power-law photon index of 1.7 at a redshift z = 0.7, and has 
been often used in the literature to separate intrinsically ob- 
scured AGN from unobscured ones (e.g., Gandhi et al. 2004; 
Padovani et al. 2004). 
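
The hardness-ratio selection of Eq. 4 amounts to the following, shown as a minimal sketch with hypothetical function names; `hard_counts` and `soft_counts` stand for the source counts H and S.

```python
def hardness_ratio(hard_counts, soft_counts):
    """HR = (H - S) / (H + S), as in Eq. 4."""
    return (hard_counts - soft_counts) / (hard_counts + soft_counts)


def is_hard_spectrum(hard_counts, soft_counts, threshold=-0.2):
    """True for 'hard-spectrum' sources, i.e. those with HR > -0.2."""
    return hardness_ratio(hard_counts, soft_counts) > threshold
```

A source with no soft-band counts has HR = 1 and is therefore always classified as hard-spectrum.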

Applying the above HR criterion to the hard band sample 
of 413 sources results in only 133 hard-spectrum sources over 
the B+G pointings, which is not sufficient for a proper corre- 
lation analysis. Since obscured AGN begin to emerge at the 
faintest fluxes, it is possible to probe them in larger numbers 
by decreasing our significance threshold for source detection. 
We therefore searched for detections with S/N>2, and found a 
total of 912 sources in the 2-10 keV band, of which 409 have 
HR>-0.2. But only a marginal correlation signal was detected
for these. 

We note, however, that there is a large variation in the 
exposure-times and limiting fluxes of the individual pointings, 
especially between the B and G pointings, which could bias 
results due to varying efficiency of selecting obscured AGN 
across the field. In order to have a more uniform coverage, we 
then restricted our source selection to the deeper G-pointings 
only. The total number of 2-10 keV detections with S/N>2 over 
the G pointings is 473. Of these, 400 have unique counterparts 
in the 0.5-2 keV band within a threshold inter-band distance 
of 10 arcsec, and 140 of these are hard-spectrum sources with 
Fig. 6. Hardness ratio (HR) vs counts (Hard+Soft; i.e. H + S) of the
G pointings sample of sources with S/N>2 in the 2-10 keV band with
either a unique counterpart (400 black dots with error bars) or no
counterpart (69 arrows) in the soft band. The horizontal dashed lines mark
the hardness ratios of a power-law source at z = 0.7 with photon-index
$\Gamma$ = 1.7 observed on the pn CCD, and obscured by various
columns of gas (labelled as $N_{\rm H}$) local to the source, in addition to the
constant Galactic column. We take all sources with HR>-0.2 to be
'hard-spectrum' sources.



HR>-0.2 (Fig. 6). In order to include the hardest sources, no 
S/N selection was imposed on the soft-band sample for this 
cross-correlation. Additionally, 69 sources (of 473) have no 
counterpart in the 0.5-2 keV band, and are likely to be very 
highly obscured ($N_{\rm H} > 10^{23}$ cm$^{-2}$ or, possibly, Compton-thick)
sources, if they are non-spurious detections (see § 4.2). We 
combine the sub-samples of the 140 detections [with 0.5-2 keV 
counterparts] and of 69 detections [without counterparts], giv- 
ing a total of 209 hard-spectrum sources. 
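
The inter-band matching step can be illustrated with the sketch below, which keeps only hard-band sources that have exactly one soft-band source within 10 arcsec. Flat-sky offsets in arcsec and the function name `unique_counterparts` are assumptions of this illustration; how the actual pipeline treats ambiguous matches is not described in the text.

```python
import math


def unique_counterparts(hard_sources, soft_sources, max_sep=10.0):
    """Map each hard-band index to a soft-band index when exactly one soft
    source lies within max_sep (arcsec); all other sources are omitted."""
    matches = {}
    for i, (xh, yh) in enumerate(hard_sources):
        near = [j for j, (xs, ys) in enumerate(soft_sources)
                if math.hypot(xh - xs, yh - ys) <= max_sep]
        if len(near) == 1:
            matches[i] = near[0]
    return matches
```

Hard-band sources absent from the returned mapping either have no soft-band counterpart or an ambiguous one.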

The nearest neighbours (NN) distribution of these hard- 
spectrum sources shows departure from a uniform distribution 
at the 99.4 per cent probability level (based on the K-S test). 
The log$N$-log$S$ and auto-correlation function (ACF) of these
209 sources are plotted in Fig. 7. $\omega$ is clearly positive in the first
two bins, representing a 3.0$\sigma$ excess assuming Hamilton's formula
(or 3.4$\sigma$ assuming that of Efstathiou et al.). Though a
power-law is not a good description of the signal, a full fit with
the model of Eq. 3 results in a normalization $\theta_0 = 42^{+7}_{-13}$ arcsec
and slope $\gamma = 3.1^{+1.1}_{-0.5}$ (uncertainties are, again, those relevant
for one parameter of interest based on Poisson errors).

A similar auto-correlation analysis of the 260 sources with 
soft spectral count ratios (HR<-0.2) shows no such excess at 
scales less than 100 arcsec. We also note that the correlation 
signal of hard-spectrum sources is not dominated by the very 
hardest sources alone (the arrows in Fig. 6). Though the sample 
size is small, the NN statistic of only the 140 hard-spectrum 
sources with soft-band counterparts gives a K-S null hypothesis 
probability of only $10^{-3}$ compared to 100 random catalogues,
again suggesting departure from a uniform distribution. 

The main results of the correlation analysis on various sub- 
samples described above are summarized in Table 5. 



Selection Criteria                  K-S      ACF

S/N>3 (B+G)    0.5-2 keV            0.001    $\theta_0$ = 6.3" ± 3; $\gamma$ = 2.2 ± 0.2
S/N>3 (B+G)    2-10 keV             0.55
S/N>2 (G)      ; 1>HR>-0.2          0.006    $\theta_0 = 42''^{+7}_{-13}$; $\gamma = 3.1^{+1.1}_{-0.5}$
               ; 1>HR>-0.2          0.001

Table 5. Basic results of the auto-correlation analysis for various sam- 
ples. 'K-S' refers to the null hypothesis probability of the data and 
control samples being drawn from the same distribution. Power-law 
fits (Eq. 3) to the ACF are listed in the final column, where computed 
or found to be significant. 



3.5. On the significance level of observed correlations 

Since the above constraints on clustering are relatively weak (a 
characteristic of angular clustering studies of sparse samples), 
it is pertinent to examine the significance levels quoted. 

Initial tests above computed the distribution function of 
nearest neighbour separations, and the results were found to be 
consistent with the strength of the angular correlation function, 
as estimated by simple pair-counting over relevant scales. The 
Poisson errors used in the ACF fits, however, are strictly valid
only for uncorrelated data. Bootstrap re-sampling (Barrow
et al. 1984) is widely used to assess the internal reliability of
a correlated dataset. Yet, as shown by Fisher et al. (1994), for
sparse samples, this can over-estimate the true uncertainties by
factors of ~ 2 (and up to 4). For uncorrelated and weakly correlated
samples, such as ours, Poisson errors approximate the
true errors, despite being a lower-limit. This approximation is
likely to break down for the sample of hard-spectrum sources
which shows the largest deviation from a random distribution.
In this case, we re-sampled the entire ensemble of random catalogues
a large number of times (> 50) with replacement in order
to compute the bootstrap errors. We find $\theta_0$ = 42" ± 22 and
$\gamma$ = 3.1 ± 1.3, implying ~ 2$\sigma$ constraints on both the normalization
and the slope of the correlation function fit of the previous
section (errors denote dispersion amongst the re-samples).
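
The idea of taking the dispersion amongst re-samples as the error can be sketched generically as follows. This is not the exact procedure applied to the catalogues in the text: `values` stands for whatever set of items is re-sampled, and `statistic` for the quantity of interest (for instance a fitted $\theta_0$); both names are hypothetical.

```python
import random
import statistics


def bootstrap_std(values, statistic, n_resamples=200, seed=0):
    """Dispersion of `statistic` over re-samples of `values` drawn with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(values) for _ in values]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)
```

For example, `bootstrap_std(list(range(100)), statistics.mean)` approximates the standard error of the mean of that list.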

Since the bins used in the ACF analysis are themselves correlated
at different scales, we compute the covariance matrix
Cov($\theta_i$, $\theta_j$) of $\omega$ returned by the above bootstrap method
between all pairs of bins ($\theta_i$, $\theta_j$) used for the fit. The correlation
matrix is then calculated; it is simply the covariance matrix
scaled to the diagonal elements as follows:

$$\mathrm{Corr}(\theta_i, \theta_j) = \mathrm{Cov}(\theta_i, \theta_j) / \sqrt{\mathrm{Cov}(\theta_i, \theta_i)\,\mathrm{Cov}(\theta_j, \theta_j)}$$

To estimate the strength of the off-diagonal correlations, we
follow Scranton et al. (2002) and form the scaled product of
elements in each row (or column) $i$ of the correlation matrix:
$P(i) = \prod_{j=1}^{N} |\mathrm{Corr}(\theta_i, \theta_j)|^{1/N}$. This product will be equal to one
in the case of perfect correlation, while we find $P \lesssim 0.2$ at all
scales of the correlation matrix, indicative of relatively small
correlations. Thus, the 'true' significance level for the ACF of
the hard-spectrum sources is likely to be straddled by the 2-
(bootstrap) and 3- (Poisson) $\sigma$ levels, as calculated above.
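
The scaling of the covariance matrix and the row product $P(i)$ can be sketched as below, with matrices held as plain nested lists. The function names are hypothetical and the covariance matrix is assumed to have a strictly positive diagonal.

```python
import math


def correlation_matrix(cov):
    """Scale a covariance matrix by its diagonal:
    Corr_ij = Cov_ij / sqrt(Cov_ii * Cov_jj)."""
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]


def scaled_row_product(corr, i):
    """P(i) = product over j of |Corr_ij|^(1/N); equals 1 for perfect correlation."""
    n = len(corr)
    product = 1.0
    for j in range(n):
        product *= abs(corr[i][j]) ** (1.0 / n)
    return product
```

Because the diagonal element of each row is always one, $P(i)$ is the geometric mean of the absolute correlations in that row and drops to zero if any pair of bins is completely uncorrelated.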
Fig. 7. log$N$-log$S$ (top) and ACF (bottom) for the 2-10 keV sample of
209 hard-spectrum (HR>-0.2) sources with S/N>2 in the G pointings.
In the log$N$-log$S$ plot, the dot-dashed line shown for comparison is
the best-fit to the full 2-10 keV XMDS log$N$-log$S$, which also traces
the slope of our full sample of hard-band sources (see Fig. 2). In the
ACF plot, notice the larger range of the y-axis compared to Fig. 5. The
dashed line is the best-fit ACF of Basilakos et al. (2004).



4. Discussion 

The XMM-LSS survey is the widest, medium-deep, high 
galactic-latitude X-ray survey carried out by XMM-Newton. 
The survey currently covers a full area of 5.7 deg$^2$ and is designed
to provide the best constraints on X-ray detected clusters
and their evolution out to z = 1. The wide, contiguous coverage
also gives ample opportunity to study the distribution of AGN 
in this field. 



4.1. Comparison with other works 

We find a positive two-point angular clustering signal with 
small correlation length in the 0.5-2 keV band, consistent 
within the errors with previous measurements using similar 
analysis (Vikhlinin & Forman 1995; Basilakos et al. 2005). 
We also note that the value of the projected correlation scale 
that we found is close to the size of the XMM-Newton PSF 
(~ 6"), implying that some level of amplification bias may ar- 
tificially result in an overestimation of the true clustering scale. 
The finite PSF can lead to confusion and an effective smoothing 
of the real source distribution, resulting in a larger correlation 
length for the density peaks of the observed, smoothed distri- 
bution (Kaiser 1984). We do not expect this bias to have a ma- 
jor effect, though, because the areal density of sources in any 
spatial resolution element (PSF) is small in relatively-shallow 
surveys such as ours (see also Basilakos et al. 2005). 

In the hard (2-10 keV) band, we do not detect any cluster- 
ing signal of the projected source distribution on the sky, unlike 
results by Yang et al. (2003) and Basilakos et al. (2004). The 
main differences of our survey with respect to the other works 



are: i) the analyzed region covers 4.2 deg$^2$ which is at least a
factor of two larger than that of Basilakos et al. (2004) and ~ 10
times larger than that of Yang et al. (2003); ii) this coverage is
contiguous (or at least pseudo-contiguous, covering 83 per cent of
the area of a 2 x 2.5 deg$^2$ rectangle); iii) in terms of flux, while
we probe sources slightly deeper than the nominal threshold
hard band flux level of ~ $10^{-14}$ erg s$^{-1}$ cm$^{-2}$ of Basilakos et al.
(2004), the Chandra survey of Yang et al. (2003) probes down
to $3 \times 10^{-15}$ erg s$^{-1}$ cm$^{-2}$, approximately 2-3 times fainter than
us$^{5}$.

Are we then simply seeing a 'truer' picture of the distribu- 
tion of AGN in X-rays as sky coverage is increased and im- 
proved? Or is the XMM-LSS area special in some way? We 
first note that the log$N$-log$S$ of the XMM-LSS field is slightly
lower than other results published in the literature at fluxes
brighter than ~ $2 \times 10^{-14}$ erg s$^{-1}$ cm$^{-2}$ in both bands. The
field for the XMM-LSS survey was explicitly defined so as to 
avoid previously-known, bright X-ray sources and the deficit 
seen could simply reflect cosmic variance in the X-ray sky. 
Indeed, cosmic variance is known to cause uncertainties in the 
normalization of the cosmic X-ray background level of ~30 
per cent above 2 keV (Cowie et al. 2002; Barcons et al. 2000). 
Fluctuations in the HEAO 1 A-2 X-ray background map on 
scales of a few degrees have been observed (Boughn et al. 
2002; Fabian & Barcons 1992) and it may be plausible that our 
field sits in a comparatively large 'void' of large scale structure.
The fraction of the hard X-ray background resolved out in
our field is, however, consistent with that found by other sur- 
veys (§ 2.4). Yang et al. (2003) also noted that cosmic variance 
manifests as 'voids', but on scales much less than a sq. deg. 

Due to the sharper Chandra PSF and the deeper flux limit 
of their survey, the positive result of Yang et al. (2003) may 
be understood in terms of their much better sensitivity to de- 
tect obscured AGN and any associated clustering signal. Not 
enough details are available for an assessment of any other dif- 
ferences (such as pipeline selection effects, differences in ACF 
fitting etc.) with respect to Basilakos et al. (2004). But a crude 
comparison shows a definite deficit of source pairs at small sep- 
arations in our field. Using their Fig. 1 and Eq. 1, we infer that 
their fields contain ~ 9 source pairs with a separation of between
50 and 60 arcsec over their area of $\approx$ 2 deg$^2$ at a similar
(or slightly higher) flux limit, while we detect exactly 9 such
pairs over our full area of 4.2 deg$^2$. Another difference is that
we use the average log$N$-log$S$ of the XMM-LSS field itself to
generate the fluxes of the random catalogues, while they use
the distribution from another field (that of Baldi et al. 2002),
though their data is consistent with the distribution used (at
least at fluxes brighter than $5 \times 10^{-14}$ erg s$^{-1}$ cm$^{-2}$; see their
Fig 1). We also note that Basilakos et al. (2004) do not include 
shot noise (§ 3.1) in the measurement of their random fluxes, 
but we found that switching off this extra Poisson noise dur- 
ing the generation of our random fluxes had only minor effects 
on the determination of the ACF. Finally, any bins with a negative
$\omega$ contribution at small scales (which often occur in sparse,
weakly-clustered samples) are neglected by them. Full source
characterization and follow-up in our field should help to discern
the correct cause of the observed differences.

5 Both quoted limits are in the 2-8 keV band. The corresponding
flux limit in 2-10 keV should be $\approx$ 20 per cent brighter, assuming
a photon-index $\Gamma$ = 1.7; the difference in band definitions is not a
dominant source of discrepancy.
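
The band conversion mentioned in footnote 5 can be checked with a short calculation. The sketch assumes an unabsorbed power-law photon spectrum proportional to $E^{-\Gamma}$, so that the energy flux in a band is the integral of $E^{1-\Gamma}$; the function name is hypothetical.

```python
def band_flux_ratio(gamma, band_a, band_b):
    """Energy-flux ratio F(band_a)/F(band_b) for an unabsorbed power law with
    photon index gamma (valid for gamma != 2). Bands are (E_low, E_high) in keV."""
    def integral(lo, hi):
        p = 2.0 - gamma
        return (hi ** p - lo ** p) / p
    return integral(*band_a) / integral(*band_b)


# 2-10 keV flux relative to 2-8 keV flux for a photon index of 1.7.
ratio = band_flux_ratio(1.7, (2.0, 10.0), (2.0, 8.0))
```

For a photon index of 1.7 the ratio is close to 1.2, matching the roughly 20 per cent difference quoted in the footnote.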

4.2. Selection and correlation of obscured AGN 

We have made predictions for the expected distribution of HR 
that would be observed for AGN spread in redshift and affected 
by different columns of intrinsic obscuring gas. We assumed 
that our sample follows a luminosity function (XLF) and obscuring
column density ($N_{\rm H}$) distribution as calculated in recent
work (Ueda et al. 2003), and folded the number counts
predicted for a limiting flux of $8 \times 10^{-15}$ erg s$^{-1}$ cm$^{-2}$
through the EPIC response function (for simplicity, we used the
pn on-axis response only). As input templates for this calculation,
we used power-law X-ray spectra with a fixed, intrinsic
photon-index of $\Gamma$ = 1.9 (e.g., Mateos et al. 2005) and modelled
photoelectric absorption and Compton reflection (Gandhi
& Fabian 2003), but did not include Compton-thick sources
($N_{\rm H} > 10^{24}$ cm$^{-2}$). Obscured AGN are defined as those with
$N_{\rm H} > 10^{22}$ cm$^{-2}$. The first plot in Fig. 8 shows that the dominant
contributors to the sample of objects with HR>-0.2 (hard-spectrum
sources) will be obscured AGN. Intrinsic photon-index
variations may push some unobscured AGN into the high
HR regime. Moreover, obscured AGN at high redshift are likely
to have low HR values (due to positive K-correction into the
soft band), so this threshold definitely does not select all ob- 
scured AGN uniquely; but given the relatively-shallow depth 
of our survey, obscured AGN will still dominate this sample. 
Also note that our observed sample of hardness ratios in Fig. 6 
shows an increase of HR with decreasing flux, as is typically 
associated with obscured AGN in which the soft-band flux is 
depleted due to photo-electric absorption in the torus. 

Over the deepest region of our survey, we detect marginal,
but positive correlation at the ~ 2-3$\sigma$ level for the sample
of hard-spectrum sources, at angular separations below ~ 100
arcsec. Interestingly, the values of $\omega$ at these separations match
well those inferred by Basilakos et al. (2004) and Yang et al.
(2003) for the full hard band, which might suggest cosmic variance
of obscured AGN specifically. Since obscured AGN begin
to dominate the source counts only at faint fluxes, we had
to include sources with a less stringent detection significance 
(S/N>2) for this analysis. We can be confident that most of 
these are not spurious because a large fraction (85 per cent) 
also have associated 0.5-2 keV counterparts (see Fig. 6). The 
most suspect detections are the hardest, faint ones with no 0.5- 
2 keV counterparts, but the correlation signal is not adversely 
affected by removal of these sources (§ 3.4). Finally, we note 
that if this sample were dominated by spurious fluctuations, any 
intrinsic clustering signal would have been diminished, rather 
than enhanced as we observe, due to their Poisson nature. 

4.2.1. Inversion of the Limber equation

An estimate of the real space correlation function can be ob- 
tained by inversion of the Limber equation that connects the 



true space correlation scale $r_0$ with the projected scale $\theta_0$ (see,
e.g., Peebles 1980; Wilman et al. 2003). For this, an estimate 
of the source redshift distribution and a model for the clus- 
tering evolution is required. We assume that the AGN in our 
survey follow the expected redshift distribution based on the 
Ueda et al. (2003) model as described above. Predictions can 
be made separately for sources that would be classified as either 
soft-spectrum, or hard-spectrum AGN, based on their count ra- 
tios, irrespective of whether they are intrinsically obscured or 
unobscured. This redshift distribution is shown in Fig. 8. Hard-spectrum
sources peak at z = 0.7, with a detectable tail extending
out to z ~ 2 at least. Though the power-law model of
§ 3.4 is not a good description of the signal, assuming the best-fit
parameters found therein, and inverting the Limber equation, we find
$r_0 = 6(\pm 3)\,h^{-1}$ Mpc [$H_0 = 100h$ km s$^{-1}$ Mpc$^{-1}$] by assuming
co-moving clustering evolution and the redshift distribution of
hard-spectrum sources in Fig. 8. This scale is similar to that
of local, optically-selected galaxies (Davis & Peebles 1983) or
that of optically-selected QSOs at z ~ 1 (Croom et al. 2001).
On the other hand, it is also consistent at the ~ 2$\sigma$ level with the
stronger clustering of extremely-red objects and powerful radio
galaxies (e.g., Rottgering et al. 2003 and references therein).
Given the weakness of the constraints above, however, we defer
further discussion on this until source follow-up is complete.

4.3. Implications 

The typical luminosities of AGN probed in medium-deep surveys
such as the XMM-LSS will be $L_{\rm X-ray} > 10^{43.5}$ erg s$^{-1}$
(e.g., Gandhi et al. 2004). A small correlation length is then 
consistent with X-ray selected Seyferts being unbiased tracers 
of structure. This is also in qualitative agreement with studies 
such as that of Waskett et al. (2005), who found no difference 
in the environments of AGN as compared to those of inactive 
galaxies. Wake et al. (2004) also arrived at a similar conclusion 
from a large sample of optically-selected AGN. 

On the other hand, if the auto-correlation of hard-spectrum 
X-ray sources is indeed larger, it would suggest that obscured 
AGN are preferentially associated with higher density peaks in 
the underlying matter distribution; the large amount of gas and 
dust present in the environment not only triggers AGN activity, 
but also hides the AGN itself. Galaxy mergers could provide 
the gas necessary to achieve both (e.g., Hopkins et al. 2005), 
as is the case for the population of Ultra Luminous Infra Red 
Galaxies. In fact, assuming that AGN found in X-rays are cor- 
related with powerful-infrared, obscured starbursts known to 
peak at z ~ 0.7 (e.g., Chary & Elbaz 2001), models can be 
constructed to explain the X-ray background spectrum as well 
as X-ray source counts (Franceschini et al. 2002; Gandhi & 
Fabian 2003). The mechanism that could drive the gas and dust 
from the large-scale environment to the scales of galactic nu- 
clei remains unclear, however. Assuming a median redshift of 
0.7, angular separations of 50-100 arcsec (the first two bins 
of the ACF in Fig. 7) correspond to projected physical sepa- 
rations of ~ 350 - 700 kpc, too large for these systems to be 
bound mergers in advanced stages. Association with filaments
and groupings in the large scale matter distribution is a possibility.
Interactions with any enhanced density of minor galaxies
also associated with the large scale structure could provide the
necessary torques to trigger gas inflows towards the nucleus,
via bars or bound mergers, for instance (Shlosman et al. 1989;
de Robertis et al. 1998).

Fig. 8. Predictions of the distribution and classification of AGN as soft- and hard-spectrum sources in the XMM-LSS. (Left) The plot shows the
expected hardness ratio histogram of AGN with fluxes > $8 \times 10^{-15}$ erg s$^{-1}$ cm$^{-2}$ in the 2-10 keV band, assuming the XLF and $N_{\rm H}$ distribution
of Ueda et al. (2003). The EPIC-pn camera response with the Thin filter is assumed. The fraction of obscured AGN ($N_{\rm H} > 10^{22}$ cm$^{-2}$) is
shaded in grey and their relative percentage of all sources in each bin is labelled. While most (> 50 per cent) of the detected sources have
very low values of HR (and lie in the first bin), the plot shows that at high HR values, obscured AGN completely dominate. (Right) The plot
shows the expected redshift distribution of AGN classified as soft-spectrum and hard-spectrum sources above the same flux limit, assuming the
luminosity-dependent density evolution model of Ueda et al. Hard-spectrum sources peak at z ~ 0.7.
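
The conversion from angular to projected physical separation at z = 0.7 quoted in this section can be reproduced approximately with the sketch below. The cosmology used here (flat Lambda-CDM with $H_0$ = 70 km s$^{-1}$ Mpc$^{-1}$ and $\Omega_m$ = 0.3) is an assumption of this illustration and is not stated in this part of the text; the function name is hypothetical.

```python
import math


def kpc_per_arcsec(z, h0=70.0, omega_m=0.3, n_steps=1000):
    """Proper transverse scale (kpc per arcsec) at redshift z in flat Lambda-CDM.

    h0 (km/s/Mpc) and omega_m are assumed values, not taken from the text.
    """
    c_km_s = 299792.458
    dz = z / n_steps
    # Midpoint-rule integral of dz / E(z), giving the comoving distance
    # in units of the Hubble distance c / H0.
    integral = sum(
        dz / math.sqrt(omega_m * (1 + (k + 0.5) * dz) ** 3 + 1 - omega_m)
        for k in range(n_steps))
    d_a_mpc = c_km_s / h0 * integral / (1 + z)  # angular diameter distance
    return d_a_mpc * 1000.0 * math.pi / (180.0 * 3600.0)


scale = kpc_per_arcsec(0.7)
sep_50, sep_100 = 50 * scale, 100 * scale
```

Under these assumed parameters, separations of 50 and 100 arcsec map to roughly 360 and 710 kpc, in line with the range of ~ 350 - 700 kpc given in the text.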

In contrast, more recent X-ray luminosity function deter- 
minations and background synthesis models (over shallower 
areas of sky at the relevant flux limits, but with extensive 
redshift coverage) do not require separate evolutionary and/or 
formation scenarios for obscured and unobscured AGN (e.g., 
Treister et al. 2004; Ueda et al. 2003): these would predict no 
difference in the comparative correlation of obscured and unob- 
scured AGN. Given the low overall significance of our detected 
correlation signal for hard-spectrum sources, we can certainly 
not rule out this possibility. Indeed, in a recent spatial cluster- 
ing analysis, Yang et al. (2006) found no difference in the cor- 
relation properties of hard- and soft-spectrum AGN (see also 
Gilli et al. 2005), in contrast to their previous results in angular 
coordinates for the same sample. This difference might be the 
result of dissimilar redshift distributions of the two classes of 
sources; whatever the reason, it underscores the need for com- 
plete studies over larger areas of sky. 

Other on-going works which will be able to measure an- 
gular correlations include the AXIS (Carrera et al. 2006) and 
the COSMOS surveys (Hasinger et al. 2006). The forthcoming 
expansion of the XMM-LSS survey itself, along with its deep,
multi-wavelength follow-up will provide good constraints on
the spatial distribution and clustering of AGN as well as clusters.
With an area of 10 deg$^2$, we expect to find at least 1000
AGN above a 2-10 keV flux limit of $8 \times 10^{-15}$ erg s$^{-1}$ cm$^{-2}$
(corresponding to our S/N>3 criterion). Additionally, the num- 
bers of obscured AGN detected should double compared to our 
present sample. The uncertainties on clustering statistics will 



decrease by a further factor of ~2, giving a proper determina- 
tion of the slope and scale length and a much better account of 
the cosmic variance. 